
Term-class-max-support (TCMS): A simple text document categorization approach using term-class relevance measure


Abstract

In this paper, a simple text categorization method using term-class relevance measures is proposed. First, text documents are processed to extract the significant terms they contain. For every term extracted from a document, we compute its importance in preserving the content of a class through a novel term-weighting scheme known as the TermClass Relevance (TCR) measure, proposed by Guru and Suhil (2015). In this way, the relevance of every term to each class present in the corpus is computed and stored in a knowledge base. During testing, the terms present in the test document are extracted, and the term-class relevance of each term is retrieved from the stored knowledge base. To allow quick lookup of term weights, a B-tree indexing data structure has been adapted. Finally, the class that receives the maximum support in terms of term-class relevance is chosen as the class of the given test document. The proposed method runs in logarithmic time during testing and is simple to implement compared with other text categorization techniques available in the literature. Experiments conducted on various benchmark datasets show that the performance of the proposed method is satisfactory and encouraging.
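The pipeline described in the abstract (per-term class weights stored in an index, logarithmic lookup at test time, and a maximum-support decision) can be illustrated with a minimal Python sketch. Several parts are assumptions rather than the authors' method: the weight used here is a placeholder (the share of a term's occurrences that fall in each class), not the TCR formula of Guru and Suhil (2015), and a sorted list searched with `bisect` stands in for the B-tree. The names `train`, `lookup`, and `classify` are hypothetical.

```python
from bisect import bisect_left
from collections import Counter, defaultdict


def train(docs):
    """Build the knowledge base from (tokens, label) pairs.

    Returns (terms, weights, classes), where terms is a sorted list and
    weights[i][k] is the weight of terms[i] for classes[k].
    """
    counts = defaultdict(Counter)  # term -> Counter mapping class -> frequency
    for tokens, label in docs:
        for term in tokens:
            counts[term][label] += 1
    classes = sorted({label for _, label in docs})
    terms = sorted(counts)
    # ASSUMPTION: placeholder relevance, NOT the published TCR measure.
    weights = [
        [counts[t][c] / sum(counts[t].values()) for c in classes]
        for t in terms
    ]
    return terms, weights, classes


def lookup(terms, weights, term):
    """O(log n) lookup in the sorted term list (stand-in for a B-tree)."""
    i = bisect_left(terms, term)
    if i < len(terms) and terms[i] == term:
        return weights[i]
    return None  # term not seen during training


def classify(model, tokens):
    """Return the class with the maximum accumulated support."""
    terms, weights, classes = model
    support = [0.0] * len(classes)
    for term in tokens:
        w = lookup(terms, weights, term)
        if w is not None:
            for k, value in enumerate(w):
                support[k] += value
    # Ties (including the all-unknown case) resolve to the first class in sorted order.
    best = max(range(len(classes)), key=support.__getitem__)
    return classes[best]


if __name__ == "__main__":
    model = train([
        ("goal match team".split(), "sport"),
        ("team win goal".split(), "sport"),
        ("stock market price".split(), "finance"),
        ("market price fall".split(), "finance"),
    ])
    print(classify(model, "team price goal".split()))  # sport
```

In this toy setup each test term costs one binary search over the sorted vocabulary, which mirrors the logarithmic test-time behaviour the abstract attributes to the B-tree index. Swapping in the actual TCR weights would only change the body of `train`.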

Bibliographic details

  • Authors

    Guru, D. S.; Suhil, Mahamad

  • Author affiliation
  • Year: 2016
  • Total pages
  • Original format: PDF
  • Language of text
  • CLC classification
